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ABSTRACT 



Pattern recognition and classification systems have been under de-» 
velopment for several years. This paper examines one of these systems, 
which has been called an adaptive linear neuron, to determine how the 
desired classification is achieved and how this system might be used in 
the practical field of character recognition. Specifically, the follow^ 
ing ideas are discussed in this papers 

(1) The basic concepts of linear separability and iterative adap=> 
tion by an adaptive linear neuron (Adeline), as applied to the 
pattern recognition and classification problem. 

(2) Four possible iterative adaption schemes which may be used to 
train an Adaline. 

(3) Use of Multiple Adelines (Madeline) and two logic layers to 
increase system capability. 

(4) Use of Adaline in the practical fields of Speech Recognition, 
Weather Forecasting and Adaptive Control Systems and the possible 
use of Madeline in the Character Recognition field. 
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1. Introduction 



Systems which can be trained to classify complex digital and analog 
patterns have been under development for several years. One such system 
has been proposed by B. Widrow and others at Stanford University CO. 

In this system, pattern recognition and classification are accomplished 
through the use of integrative memory cells, each of which consists of a 
variable resistor whose value can be made, by suitable ''training", to 
become a useful function of the experiences of the device itself. The 
training of these devices is essentially a process of iterative adaption. 

In general, training consists of a systematic adjustment of each of the 
memory elements of the device in such a way that the system is forced to 
produce a predesignated output to each of a number of specific inputs. 

After the device has successfully adapted to a large number of training 
patterns, it has the ability to classify inputs which are related to the 
training patterns, as well as the training patterns themselves. 
Characteristics of the trained system are the effectively Instantaneous 
classification of pattern inputs and the diffuse storage of information 
throughout the memory elements. The basic system developed by the Stanford 
group is an adaptive threshold device called an adaptive neuron, or 
Adeline, an acronym of adap tive linear neuron. A discussion of the 
functions performed by this element will be presented later. Systems 
containing one or more of these devices are being developed for use in 
speech recognition, weather forecasting and electrocardiogram analysis. 

They have also been used as trainable controllers in adaptive control 
systems . 

In this paper it is planned to review the progress made by the 
Stanford group in the field of pattern recognition and classification. 
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Specifically, a close examination will be made of an Adaline, the basic 
ideas behind its use, the procedure followed in training it and the 
results obtained using a trained system. The investigation will be 
prosecuted experimentally by testing digital computer simulated models. 
In addition, a similar mathematical model will be applied to the 
practical problem of classifying the ten characters zero through nine 
when these are in a hand printed form. Finally, a start will be made on 
the problem of identifying a set of 45 alpha-numeric characters, also in 
hand printed form. 
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2, The Basic Adaptive Neuron. 

The basic adaptive neuron, or Adaline, developed by the Stanford 

group is shown in figure 1. The input p.'s are binary j, but for 

convenience they are required to take the values of +1, rather than 

the more conventional +1 and 0. An input pattern is defined to be a 

particular set of values of the Pj's. In the Adaline each p^, is 

multiplied by a corresponding (analog) weight, w^, which can be re= 

garded as the current setting of the i memory cell, and which can 

take both positive and negative values. The weight w^ is called the 

n 



threshold weight. Thus the analog output of the summer is ^ p.w. + w 

rn -> i=l ^ ^ 

and the digital or quantized output is sgn | 



Pi 

P2 

?3 



n 
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3. Linear Separability. 



For simplicity, a two input Adaline will be discussed first. 
Suppose that it is desired to classify the possible input patterns into 
two classes: 



W 2 “ 1, then the corresponding digital outputs will be +1 and -1. It 
will be said that this setting of the weights has classified the patterns 
into two classes, (a) with a +1 output, and (b) with a »1 output. In 
more complicated cases, the values of the weights would be adaptively 
determined during training. However, is such classification always 
possible? Suppose it is desired to classify the input patterns into 
the two classes: 



In this case it is not possible to find a set of 10 ^ which will yield a 
digital output of +1 for the first class of input and -1 for the second. 



(a) one or both p^ positive, and 

(b) both pj negative. 



In this particularly simple case it is easy to see that if Wq “ “ 



(a) both p. positive, or both p^ negative, and 

(b) Pj's with alternate signs. 




.b 




a . 



a 




Pi + P 2 + 1 “ 0 



a . 




a 



b 




N 



Fig. 2. Separation 
Into Two Output Classes 



Fig. 3. Not Separable 
By One Adaline 
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It will then be said that the patterns are not '’linearly separable" into 
these two classes and that a classification cannot be affected with a 
single Adeline. 

The same thing can be illustrated graphically as in figures 2 and 3. 

n 

The analog output must change sign as the line + w ■ 0 is 

i=l ^ i 0 

crossed, and it can be seen that the analog output above the line in 
figure 2 is positive, and that it is negative below the line. Since a 
line can be drawn (actually an infinite number of lines can be drawn, 
each with different w^^) separating the a and b pattern classes, it can be 
concluded that the patterns are linearly separable into these two 
classes. On the other hand, it is immediately obvious that the classic 
fication of figure 3 is not possible. 

It is of interest to continue the classification of the possible 
input patterns into different groups of two classes until all possible 
combinations have been examined. Using this procedure, both the number 
of linearly separable and the number of not linearly separable examples 
may be determined for a two input Adeline. The digital outputs for the 
two classes will bes 

class (a) +1, and 
class (b) -1. 

Table 1 contains eight examples illustrating this concept of linear 
separability. 

It can be seen that examples 1 thru 6 in the table could be repeated 
with the desired digital outputs reversed, i.e., with a °1 output for 
class (a) and a +1 output for class (b). This procedure would double 
the number of linearly separable classifications, and with the addition 
of the case illustrated as example 7 and its counterpart (for all inputs 
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Table 1. Separable and Not Separable Examples For Two Binary Inputs 


No. 


Desired Classification Graphical One Set of w^ For 

of Input Patterns Representation Linear Separation 


1 


• . 

(a) One or both p. + 

(b) Both pj - 

> 


^ ^ ”o ^ 

— w - 1 

\ cu. W2 = 1 


2 


(a) Both p.*s + and both b • 

p.*s =, p^ + and p^ - 

(b) pj^ - and p^ + .0- 


1 0. * _ J 

^ 0 

— w, = 1 

a ^ W 2 = - 1 


3 


(a) All except in (b) o- 

(b) Both Pj's + "" 

.0-, 


b>* ^ 

Wj - 

a, . «2 ■= -1 


4 


(a) Botn Pj ' s +, d. ^ 

p.'s p^ ■* and p^ + 

(b) p|^ + and p 2 - ,®” / 


0.* = 1 
/f 

= «=1 

b. W2 = 1 


5 


(a) Both Pj's +, pj^ •= and ?2 Q- < 

(b) Both Pj's Pj^ + and p^ » 

.b 


w = 0 

^ ^ A ''O ^ 

"-^h W, = 0 

I 1 

b . W2 = -1 


6 


• jk 1 

(a) Both Pj's +, pj^ and ?2 “ 

(b) Both Pj's ,pj - and p^ + 

.b 


w„-o 

— = 1 
O'. W2 = 0 


7 


(a) All inputs + ^ 

. .. _ \ 


(X w = 1 

0 

— w = ^ 

a.. «2 = ^ 


8 


(a) Both Pj's +, both Pj's b J 

(b) Pj's w/ alternate signs 

Ou 

• 


^ ^ ^ None 

■ ^#5 

b 

• 
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to be classified as (b)), would yield a total of fourteen examples which 
are linearly separable. The interchange of the +1 and "I of example 8 
would yield a second example which is not linearly separable. Thus the 
total number of separable and unseparable classifications into two 
classes is equal to 16. 

Suppose now that the number of binary inputs to an Adeline is 3 
rather than and that it is desired to classify the input patterns 
into the two classes: 



(a) 


Pi “ +1» ?2 “ 


p = +1 


(b) 


Pi “ "1» ?2 “ 


P “ Jl 

F3 



If the weights are chosen to be w, “1 and w^ w w 0, it is 
easily seen that the digital output will be +1 for all the patterns in 
class (a), and for those in class (b). Thus separation has been 
achieved by the plane p^ “ 0. 



Table 2. Number of Linearly Separable Classifications 
for Different Numbers of Binary Inputs 


Number of 
Binary Inputs (n) 


Maximum Possible Number 
2n 

of Examples (2 ) 


Number of Known Linearly 
Separable Examples 


1 


4 


4 


2 


16 


14 


3 


256 


104 


4 


65,536 


1,882 


5 


4.3x10^ 


94,572 


6 


14 

1.8x10 


15,028,134 
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In general, if there are n binary inputs to an Adalinej. the two 
classes will be separable if the w^ can be chosen in such a way that the 
n-1 dimensional hyperplane + ^ 2^2 * * '^n^n ^ ° ^ separates 

the two classes in n-space. 

It was shown above that, for an Adeline with two binary inputs, 
the total number of input>output situations was 16. An extension of 
this reasoning from 2 to n binary inputs indicates that there are 2 
possible examples, again including both linearly and not linearly 
separable types. Table 2, which was extracted from £ 2 ^, lists this 
figure together with the numoer of known linearly separable examples, 
for n from 1 to 6. 
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A. Threshold and Dead Zone. 



Linearly separable input patterns cannot normally oe separated 

unless the weights are carefully chosen. The process of changing the 

weights until each weight has the value most likely to affect the linear 

separation of the input patterns is called adaption or training. 

One of the weights adjusted during adaption is w^, the threshold 

weight. First, in relation to the binary or quantized output, we see 

n 

that the output changes sign when YL Pi'^i “ threshold can 

i=i 

therefore be regarded simply as a bias on the quantizer, as shown in 
figure 4. Alternatively, if we consider the geometry of the hyperplane, 
it can be seen that a change of the threshold from vjq to w^' affects a 
shift of the hyperplane to a new location parallel to its original 
position, as illustrated in figure 5 for n = 2. In fact, the 
perpendicular distance of the hyperplane from the origin, d, i^ill be 

. It should also be noted that the slope of the 

(w^ + W2 ---- w^^ ) 




Quantizer 

Fig. 4. .:,ffect on Quantizer of 
Changes in threshold Weight 
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m 




hyperplane is not a function of the threshold, but instead is determined 
by the settings of the other weights. 




Hyperplane 
for Wrt 



For the situation depicted in figure 6, any of the three hyperplanes 

tl 

0 



determined by the thresholds Wq, w^* and w^” will linearly separate the 



input patterns into the two desired output classes a and b. However, 
use of the Wq’ or Wq*' hyperplanes would not provide the safety that 
might be needed in a practical situation. Some of the factors that 
might cause classification errors are resolution of the quantizer and 
drift in the circuit components. As a result, the input to the quantizer 
must always be sufficiently different from zero so that a small change 
in the quantizer input after adaption will not yield an incorrect digital 
output. In addition, the problem of resolution and drift is magnified 
by the fact that, as the number of binary inputs increases, the tolerance 
required on each weight becomes more critical. This is discussed in 
Now what can be done to decrease the effect of such errors on the 
Adaline system? One technique would be to utilize the Wq hyperplane 
as the separating plane, with the w^’ and w^'* hyperplanes as the outer 
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limits of a buffer zone about the separating plane® The system would 
then be trained so that all possible input patterns would be classified 
into classes above and below the buffer zone^ with a consequent reduc= 
tion of the danger of producing an incorrect output from the quantizer. 
For convenience, each of the spaces on either side of the separating 
plane will be called a "dead zone" as labeled in figure 6. 
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5. Multilevel Adeline. 



It is often necessary to separate the input patterns into more than 
two classes. In some cases this can be done effectively with a multi- 
level Adeline, which separates the input patterns into classes according 
to the analog output (level) assigned to each pattern. For this type of 
separation to be possible the input patterns or sets of input patterns 
must be linearly separable in a specific fashion. The several assigned 
output levels may be regarded as equivalent to a corresponding set of 
threshold weights. These in turn define a set of parallel hyperplanes 
such as shown in figure 6 for Wq, Wq' and Wq"-> Therefore, the use of the 
multilevel Adeline is restricted to the classification of inputs which 
can be linearly separated from each other by a series of parallel hyper- 
planes. 

A typical use of a multilevel Adeline would be to classify the 
written digits one through nine. As a simple test of this concept a 
computer simulated Adaline was used to separate one sample of each of 
these digits. The written digits were converted into patterns using a 
seven by seven input space. After training, all nine patterns were 
correctly classified showing that this particular set of patterns was 
separable in this manner. 
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6. Ada line Adaption. 

The previous sections contained a discussion of Adaline fundamentals 
and the concept of separating pattern classes by the appropriate setting 
of the weight elements. Training or adaption is the process of iter- 
atively determining suitable values for the weight elements. This 
section will formulate the adaption problem and then consider four 
possible adaption or training procedures. 

Adaline application in a pattern recognition problem involves the 
following sequence of events. Firstly» training patterns are chosen and 
assigned digital outputs, i.e. plus or minus one. Then an adaption 
process is selected and Adaline is trained until it either correctly 
identifies all the patterns, or until the training is terminated due to 
a failure of the training scheme to converge. If the training was not 
so terminated, the Adaline would now have the capability to correctly 
identify all the training patterns (thus proving the pattern classes to 
be separable) and in addition, the ability to recognize a number of 
'•similar*' patterns. In fact, one of Adeline's main advantages is the 
ability to identify patterns it has not been trained on. Other methods 
of pattern recognition such as "table look up" do not generally have this 
capability. 

One question now to be answered is, what constitutes a "similar" 
pattern? One measure of the similarity of two patterns would be the 
number of pattern elements that would have to be changed in one pattern 
to make the two patterns identical. Figure J displays three patterns, 
each containing 16 pattern elements. Pattern (a) differs from pattern 
(b) by one pattern element, while pattern (a) differs from pattern (c) 
by eight pattern elements. Patterns (a) and (b) would be classified as 
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"similar" patterns while patterns (a) and (c) would be classified as 
dissimilar patterns if the above criterion were usedo In some cases, 
however, a pattern might be regarded as "similar" to a training pattern 
if it were related to it, for example, by a simple rotation or transla= 
tion. In that case patterns (a) and (c) would be regarded as similar. 
This criterion might be used if the Adaline was being specifically 
trained to recognize patterns even when translated or rotated from their 
normal configuration. 



XXX. 
. X 0 . 

O X . 0 



XXX. 



. X 



XXX 
. X . 



(a) (b) (c) 



Fig. 7. Similar and Dissimilar Patterns 

Finally, a "similar" pattern is often generated by the contamination 
of a training pattern by noise. For this reason "similar" patterns are 
sometimes referred to as "noisy" patterns. The ability of Adaline to 
recognize similar patterns is clearly an extremely complex function of 
the total number of pattern elements in the input pattern, the number 
and complexity of the training patterns, the value of the weight 
elements, and the degree of similarity of the input pattern to a train- 
ing pattern. 

It was shown earlier that the analog output is imposed on a 
quantizer whose digital output is either plus or minus one. It is 
usually desirable to endow the quantizer with a dead zone, so that some 
finite magnitude of the analog output is required before the digital 
output becomes non°zero. During the training or adaption phase the 
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analog output of each training pattern is made equal to or greater than 
an ’’adaption dead zone”o After training,, a smaller value of dead zone 
may be introduced which will often enhance the probability of correctly 
identifying similar patternso Thus patterns will be classified accord- 
ing to the relationship of their analog outputs to a ’’recognition dead 
zone"« 

The adaption or training problem can be formulated ass 

Givens A collection of patterns and the desired digital 
output for each pattern.. 

Finds The value of the weight elements such that the 

n 

expression Wq + ^ p.w. yields the appropriate analog 
i=l ^ ^ 

output for each input pattern. 

The weight values are usually determined by an iterative process 
based on a comparison of the actual analog output for each training 
pattern with the corresponding desired analog output. In other words » 
an input pattern is imposed on Adaline,, the analog output is examined^ 
the weight elements are changed if required » and then another pattern is 
imposed. This process is continued until the analog output of each 
pattern is acceptable without further adaption^ or until it is determined 
that the process will not converge. The adaption process will be defined 
as having converged when each training pattern generates an acceptable 
output. It should be noted that an adaption process may not converge to 
a solution even though the pattern classes themselves are linearly 
separable. In such a case a different adaption technique would have to 
be employed. 
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7. Specific Adaption Schemes. 



Common to each of the adaption schemes discussed below are the 
following rules. 

When an analog output is unacceptable? 

1. Change each of the weight elements by the same absolute 
magnitude. 

2. Increase or decrease each weight according to whether the 
product of the corresponding pattern element by the desired 
change of the analog output is positive or negative. For 
example, if the analog output is to be increased and pj^ 

is =1, then w^ would be decreased. 

A description of the four adaption schemes listed in QT} now follows. 
Minimum Square Error Adaption 

This process adjusts the weight elements in such a manner that the 
analog output for each pattern is driven toward the same absolute magni= 
tude, called the adaption dead zone. The error at each step is defined 
as the difference between the adaption dead zone and the actual analog 
output, and the change made to each weight element is expressed by the 
following equations 

I A 1 “ Error x Proportional Constant 
' Total Number of Weights 

If the proportional constant is greater than zero and less than two, 
the process will converge provided that the patterns are both linearly 
separable and meet an additional requirement that will be discussed 
later. In the case where the weight elements are continuously variables) 
it has been experimentally determined (Appendix Ill)s, that the number of 
iterations required to converge in a typical situation approaches a 
minimum when the proportional constant is approximately equal to one. 
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The criterion requiring identical magnitudes for the analog outputs 
for all input patterns causes an extremely large number of iterationso 
Therefore, a tolerance is usually established around the adaption level, 
defined as minimum square error bound, and no changes are made to the 
weight elements if the analog output falls within this bound* 

This adaption scheme also suffers from the disadvantage that there 
is no assurance of convergence, even if the pattern classes are known to 
be linearly separable* If the number of input patterns is equal to or 
greater than one plus the number of weight elements, there is a possi° 
bllity that the adaption will not converge* The two pattern space, 
considered in section 3, will be used to illustrate this limitation* It 
can be seen in Figure 8 (a) that the classes can be separated in such a 
way that the analog output has the same magnitude for each pattern* This 
is not true in the case of the configuration of Figure 8 (b), and the 
minimum square error procedure, as described above, will not converge 
even though the classes are linearly separable* 

The next three adaption procedures have been proved to converge to 
a solution provided that the pattern classes are linearly separable* 

These schemes compare the adaption dead zone value with the actual analog 
output and, if the magnitude of the analog output is less than the adap= 
tion dead zone value, some method will be employed to adjust the weights 
in such a way as to increase the analog output magnitude* However, if 
the analog output magnitude is equal to or greater than the adaption dead 
zone, no changes are made* 
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X +i Class Patterns 
0 =1 Class Patterns 



n 

”0 * Pi"! 
n 

+ \ p w * 

0 i4i "i i 



+d 



= 0 



JQL 

«0 ^ Pi”i “ 

i»l 1 1 



= <=d 



Fig® 8o Separation With Minimum Square Error Procedure® 
Incremental Adaption 

When the analog output magnitude is less than the adaption dead 
zone value, all the weight elements are changed by an amount: 



Awi 



Incremental Constant x Adaption Dead Zone 
Total Number of Weight Elements 



The number of iterations required to converge, and the final value of the 
weight elements, is a function of the incremental constant# Note that 
the corrections are independent of the errors# 
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Relaxation Adaption 



This procedure is similar to the minimum square error adaption 

except that no corrections are made when the analog output magnitude 

is equal to or greater than the adaption dead zone* The adaption pro® 

cess will converge if the proportional constant is between zero and two* 

The weight elements are changed, if requiredj> by an amounts 

1 A I - Dead Zone ® Analog Output) x Proportional Constant 

l^^il “ Total Number of Weight Elements 

Modified Relaxation Adaption 

A disadvantage of the minimum square error and relaxation adaption 

procedures is the large number of iterations required for convergence* 

When the difference between the desired analog output and the actual 

analog output is small, a small correction is made* Thus, the closer 

the process gets to the solution, the smaller the magnitude of the 

corrective changes* The modified relaxation adaption procedure over® 

comes this difficulty by correcting to a value larger than the adaption 

dead zone. This value, usually 1*1 to 1*5 times the adaption dead zone 

magnitude, is defined as the adaption level* No corrections are made if 

the output magnitude exceeds the adaption dead zone* The equation for 

the change of weight elements, where the proportional constant should 

again be between zero and two for convergance, iss 

1 A w 1 « (Adaption Level ® Analog Output) x Proportional Constant 
1 M Total Number of Weight Elements 
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8. Evaluation of Adaption Procedures- 

The different characteristics of the adaption procedures were in° 
vestigated by the solution of some typical problems- In all cases, 
Adaline was simulated on a CDC 1604 Digital computer using one of the 
adaption procedures which are defined by the Fortran statements listed 
in Appendix I. In one example the adaption process is examined after 
each iteration, (Appendix II), while in another a test is described of 
Adeline's ability to correctly identify a number of patterns upon which 
it had not been trained (Appendix IV). 

If training patterns are not separable, the training process will 
not converge and must be arbitrarily terminated- The Adaline will then 
fail to identify all of the training patterns correctly, but if the 
number of failures is small this may be tolerable in some applications. 
When it comes to the recognition of noisy versions of the training 
patterns, it must be expected that Adaline will only recognize a sta= 
tistical percentage of the similar patterns presented. The only method 
of ensuring that Adaline will correctly identify all possible input 
patterns is to train on all the conceivable patterns. But, this is 
then a form of "table look up" recognition, which can be performed by 
other means without the necessity of employing an iterative scheme- 
Appendix V summarized the results of training Adaline on pattern classes 
that are not separable. 

No attempt will be made to promote the use of one adaptive procedure 
in lieu of the others- It can be noted that the modified relaxation 
procedure usually requires the fewest number of iterations to converge, 
but results in a wide spread in the analog output values- On the other 
hand, the minimum square error adaption requires more iterations to con= 
verge but has a narrow spread in the analog output values. 
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9. Possible Adaline ApplicationSo 

The output of a trained Adaline can be regarded as a binary digit» 
or logical decision^ whose value depends upon the pattern input to the 
device. It followsj, therefore* that an Adaline can* in principle, be 
applied in any situation where a decision is to be made on the basis of 
some "input" to the decision making device. In particular* the Adaline 
concept is valuable in situations where performance can or must be 
improved as experience accumulates. 

The basic difficulties in the application of Adalines relate firstly 
to the problem of converting an input into a pattern* and secondly to the 
development of a suitable training procedure which will ensure that the 
Adaline does in fact improve its performance with practice. The following 
three sections will consider a few of the possible Adaline applications. 
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10. Servo-Mechanism Controller 



A single Adeline with its digital output can be used as a bang-bang 
servo-mechanism controller as in figure 9<. 




Fig. 9. Basic Adeline Servo-Mechanism Controller 

Graduate students at Stanford University have used an Adeline in 
such a control problem. The plant consisted of a rolling cart powered by 
a reversible electric motor. Installed on the cart was an inverted 
pendulum. The Adeline controller was trained to keep the pendulum in a 
vertical position without extreme excursion of the cart in either direc- 
tion. Four plant variables were measured; the position and velocity of 
the cart and of the pendulum. 

The direction the electric motor should rotates, and thus the desired 
Adeline digital output, is a function of the four measured variables. 

The value of each variable can be cataloged into one of several distinct 
levels, and each level can in turn be represented by a code consisting of 
a series of pattern elements. These pattern codes must be carefully 
chosen to ensure that the pattern classes are linearly separable. The 
complete input pattern is composed of the pattern elements of the four 
variables. 

The Adeline is trained by permitting it to observe the performance 
of another type of controller. The ''correct’* response is then available 
at all times and the Adeline weights can be adjusted to bring the Adeline 
output into agreement. After the training is completed, the Adeline can 
take over the operation of the plant. 
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11, Speech Recognition. 

A real time speech recognition system has been constructed at 
Stanford University# The systems, which consists of several parallel 
Adelines^ has the capability of converting speech into typewritten words. 
Since the operation of parallel Adelines will be discussed in a later 
section, only the coding technique will be discussed here. 

The main problem encountered in coding was the choice of parameters 
to describe the sound of a spoken word. Bandpass filters were employed 
to separate the sound energy into eight frequency bands and the sound 
intensity in each band was then digitally coded according to the ampli= 
tude level. Four levels were chosen corresponding to the three bit 
patterns, 000, 001, 011, or 111, where 1 equals +1 and 0 equals “I. 

Ten samples were taken during the utterance of a word, so that each 
filter generated 30 pattern elements. The complete pattern from all 
eight filters therefore consisted of 240 pattern elements. 
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12. Weather Forecasting. 

Adalines have been applied in the area of weather forecasting to 
the extent of predicting "fair'* or **rain** in one locality In 

this application parallel Adalines were trained to interpret weather 
maps. In particulars, they were trained to "read** surface pressure maps 
of a 500,000 square mile area. 

The weather map was divided into 48 regions each of approximately 
600 square miles. Then, the expected range between the highest and 
lowest pressures was divided into ten levels, each of which was repre° 
sented by one of the ten digits, 0 through 9. Thus, each input pattern 
contained 48 pattern elements each of which could acquire one of ten 
values (as compared with the usual two). 

The results obtained from the Adalines were comparable with those 
obtained from "human** weather forecasters. 
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13. Classification When Pattern Classes Are hot Linearly Separable. 

Previous sections have discussed the basic structure of an Adeline 
and have detailed several schemes for the use of an Adaline in pattern 
recognition and classification. The discussion thus far has been limited 
to the separation of sets of input patterns into classes by a single 
Adaline. It was found that the classification of input patterns into 
two classes could not always be affected with a single Adaline. This 
situation was illustrated by the not linearly separable example shown in 
figure 3. In that example it was desired to classify all input patterns 
into the two classes! 

(a) both pj positive, or both p^ negative^ and 

(b) Pj*s with alternate signs 

This separation can be accomplished by a system using two Adelines in 
parallel. The weights of the two Adelines define two hyperplanes, which 
affect the desired separation as illustrated in figure 10. The overall 
system consists of two logic layers 9 the first layer being composed of 
Adelines with adaptive elements and the second layer made up of a fixed 
logic element or threshold device. In the first layer each Adaline 




Fig. 10. Linear Separability 
With Multiple Adelines 
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attempts a classification of input patterns into linearly separable 
classes while the second layer combines the outputs of the first layer 
to complete the desired classification. Using the example of figure 10, 
the inputs would be classified in the first logic layer as follows: 
Adaline One a) both pj^ positive 

b) all others 

Adaline Two a) both p^ negative 

b) all others 

This classification would place the two hyperplanes as in figure 10. 

The next step would be to insert the Adaline outputs into the second 

1 

layer which is an ••or" gate (in this case) in order to realize the 
desired output classification. This process is summarized in table 3. 



Table 3. 


Separation Using Two Logic Layers 




Inputs 


Outputs 




Adaline #1 


Adaline #2 


Or Gate 


Desired 


1 1 


1 


-1 


1 


1 


1 -1 


-1 


-1 


-1 


-1 


-1 1 


-1 


-1 


-1 


-1 


-1 -1 


-1 


1 


1 


1 



It is apparent that the choice of weights (hyperplanes) in the 
first level Adelines is dependent upon the logic device used in the 
second layer and vice versa. It follows that the choice of a logic 



device and the establishment of a training procedure for the Adelines 
may prove very difficult if the result is not known in advance, as it 
was here. In fact this would seem to be the major difficulty in this 

^An ”or'* gate is a device which gives a positive output if one or 
more of its n inputs are positive. 
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scheme. One approach to this problem now follows. 

There are many devices which could be used in the second layer to 
combine the Adeline outputs, so that it is important to estimate which 
of these devices is likely to be the best one for this purpose. The 
previously mentioned •*or” element can be regarded as a special quantizer 
whose output is -1 unless at least one Adeline output is positive. At 
the other extreme would be an element whose output is -1 unless all n 
Adeline outputs are positive. 

In trying to find the "best" quantizing device for use in the second 

logic layer it seems natural to try a device whose output becomes +1 when 

about n of the Adeline outputs become positive. It has been found, [ 7 . , 

2 

that for a system with an odd number of Adelines, a simple output 

majority, or a second layer "threshold" of n+1 , will realize the 

2 

classification of the greatest number of inputs. Similarly for an even 

number of Adalines, second layer thresholds of n or n+2 will realize the 

2 2 

highest percentage of classifications. It should be noted that the 
criterion used in the above reference to determine the "best" second 
layer threshold for general use was the criterion of the classification 
of the maximum number of input pattern sets. However, for any specific 
patterns, or sets of patterns, the threshold which is "best" might be 
anywhere from 1 to n. 

There is still the problem of choosing the "best" adaption scheme 
for the first layer. One method that has been suggested makes use of 
both the analog output data from each Adaline and the digital output 
desired from the overall system. In the case where the second layer 
element is an "or" gate, the procedure to be followed will depend on 
the desired system output. Thus, if the desired system output is -1, 
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all Adalines must give a negative output and any Adaline which is giving 
a positive output will have to be adapted. However, when all first layer 
outputs are negative and the desired system output is +1, only one 
Adaline need by adapted. In this case it has been suggested that 
adaption be confined to the Adaline whose analog output needs the small= 
est change to establish the required condition. If the second layer 
threshold is a majority logic device, the same general procedure will be 
followed. If, to obtain the desired system output, the output of at 
least k Adalines must be revised, the k Adalines whose outputs require 
the least change will be adapted. The idea behind this procedure is that 
adaption should take place with the minimum of disturbance to the pre= 
viously established pattern of weights. 

It may be noted that if it were not for the difficulty in choosing 
a suitable logic for the second layer, and a training procedure for the 
first, it would be theoretically possible to establish any desired 
classification if sufficient Adalines were employed in the first layer. 

In an extreme case, for example, the n+1 hyperplanes defined by the 
weights of n+1 Adalines could separate one input from all others in 
n-space IJE the weights could be chosen properly, and ^ the second logic 
layer properly interpreted the outputs of the Adalines. 
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14. Hacialina. 



Multiple ada lines in parallel, or a Madaline as this coinbination is 
sornetimes called, may also be used to classify a set of input patterns 
into more than two output classes. If the input patterns can oe 
appropriately separated by each of the Adalines, Madaline can accomplish 
the desired classification with the help of a digital output coding 
scheme such as that illustrated in figure 11 and table 4. It is easily 
seen that the maximum number of individual output codes available is 2‘^ 
where m is the number of Adalines in the Madaline. As m = 3 in figure 11 
and table 4, it is readily apparent that eight input patterns or pattern 
sets can be classified into eight different digital output combinations 
in this situation. 

Here the coding scheme is predetermined, so that the training 
follows the procedure devised for a single Adaline. That is, each 
Adaline is trained individually to generate the appropriate response to 
each of the training patterns. 

The problem which may arise is related to the choice of codes for 
the several pattern sets. If these are not properly chosen, then the 
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individual Adalines may well be attempting to separate the patterns into 
classes which are not linearly separable. A change in the codes may 
eliminate the problem in any given situation, but no systematic pro- 
cedure for choosing the coding scheme has been found. 

Also to be considered is the fact that the coding scheme may affect 
the number of Adalines needed, and/or the number of adaptions necessary 
during training. These considerations, however, seem to be less im- 
portant than that of the previous paragraph. 

For the example of figure 11 all eight possible binary output codes 
are used. Note, hot^ever, that there are many possible choices for the 
code to be assigned to each pattern or pattern set. One such choice is 
shown in table 4. 



Table 4. Typical Madaline Output Coding Scheme 




Adeline 1 
Output 


Adaline 2 
Output 


Adaline 3 
Output 


pattern 1 


1 


-1 


1 


pattern 2 


-1 


1 


-1 


pattern 3 


1 


-1 


-1 


pattern 4 


-1 


1 


1 


pattern 5 


1 


1 


“ 1 


pattern 6 


-1 


-1 


1 


pattern 7 


1 


1 


1 


pattern 8 


-1 


-1 


-1 



To summarize, each of the Adalines is trained to separate the input 
patterns into two classes (a) +1 output and (b) -1 output. Then, as an 
input pattern is imposed on the Madeline, the Adalines simultaneously 
determine their output responses. The outputs can then be examined to 
properly classify the imposed pattern. The classification of the 
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written digits, one through eight, suggested herein was successfully 
accomplished using three computer-simulated seven by seven Adelines. 
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15. Character Recosnition 

In this section the application of a Madaline to the recognition and 
classification of hand printed letters, numbers and other punctuation 
symbols will be discussed. For definiteness, the alpha-numeric character 
set chosen corresponded to that used in Fortran computer coding, and the 
aim was to evaluate the possibility of ••reading" hand printed Fortran 
symbols using a Madaline. 

The investigation was divided into two parts. The first consisted 
of an evaluation of the feasibility of classifying the ten numeric 
characters using a Madaline, while the second consisted of an attempt 
to classify 45 Fortran characters using a similar device. 

In the first part of the investigation the Madaline consisted of 
four Adelines, the minimum number required, each with a seven by seven 
input space. The first tests were conducted using patterns of ten digits 
obtained from five different persons, each of whom wrote one sample of 
each of the ten digits on a standard Fortran coding form. The written 
digits were enlarged by projection on a screen and a seven by seven grid 
form was used to determine the actual input patterns used. Each test 
digit was centered on the seven by seven grid form and a digit X7as de- 
fined to be ••in" a grid space if it entered in such a manner that both 
sides of the line could be seen Inside that space. The tests, which are 
detailed in Appendix VI, were conducted using the patterns obtained in 
this manner. In one of these tests, the Madaline was trained four times 
on each of the fifty different input patterns and was then asked to 
classify each input pattern. All input patterns were classified 
correctly. 

The second part of this investigation was again conducted with the 
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minimum number of Adelines required. Six Adelines, eech V7ith e seven by 



seven input spece, were used to clessify the 45 Fortren symbols, 
results of this clessificetion ere deteiled in Appendix VII, 



The 
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16. Summary and Acknowledgements. 

This paper contains a survey of the pattern recognition problem as 
approached through the application of adaptive linear neuron devices. 

Some of the basic Adaline concepts were verified experimentally and some 
concepts were amplified. In particular* an example was shown of the 
inability of the minimum square error adaption procedure to separate in<° 
put patterns that were linearly separable* and a method was developed 
to display the convergence characteristics of the individual adaption 
procedures. In addition* preliminary tests to determine the feasibility 
of utilizing a Madaline for character recognition indicated that a pos» 
sible application exists in this field. 

An interesting project* which is a natural follow up of the work de° 
tailed herein* would be that of using an actual laboratory set up to more 
completely determine the feasibility of utilizing Madaline for character 
recognition. Perhaps this could be accomplished using photocells for 
input pattern detection. Another carry over project using Adelines would 
be to continue the weather forecasting analysis previously discussed (j0. 
The wealth of weather data available at this location makes this a par^ 
tlcularly feasible project. 

The authors wish to acknowledge the guidance and assistance given 
to them by Dr. J. R. Ward. 
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APPENDIX I 



FORTRAN STATEIvIENTS OF ADAPTION PROCEDURES 
The Fortran statements used to simulate the four adaption procedures 
are listed below. The program was written for an Adaline having 17 
weight elements. It is to be noted that this is not the complete program 
but only the adaption techniques. 

Abbreviations used in the program; 



ADAPT 


Adaption level 


BETA 


Increment constant 


BMSE 


Bound for minimum square error 


DELTA 


Dead zone level 


PAT(I,J) 


Input pattern, I=pattern element, J=input pattern 
number 


PIE 


Proportional constant 


SGN(J) 


Desired binary output of pattern J 


SUM 


Analog output of pattern J 


WEY(I) 


Value of weight element I 



C MINmUli SQUARE ERROR ADAPTION 
ERROR=SGN ( J ) *DELTA - S UI'l 
A3ER=ABSF( ERROR) 

IF(ABER-BMSE) 100»100»50 
50 ENORM“PIE*(ERROR/17.0) 

DO 60 1= 1,17 

60 WEY(I)=WEY(I)+ENORM*PAT(I,J) 
100 CONTINUE 
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C INCRE.M2NTAL ACfe\PTION 

IF(SGN(J)) 256, 254,254 

254 IF(SUM) 258,258,255 

255 IF(SUM=DELTA) 258,262,262 

256 IF(SUM) 257,259,259 

257 IF(SUM+DELTA) 262,262,259 

258 SIGN=+loO 
GO TO 260 

259 SIGN=°1<.0 

260 DO 261 1=1,17 

261 WEY ( I) =WEY ( I) +S IGN*B ETA*PAT ( I , J) 

262 CONTINUE 

C RELAXATION ADAPTION 

IF(SGN(J)) 306,304,304 

304 IF(SUM) 310,310,305 

305 IF(SUM+o0005=> DELTA) 310,312,312 

306 IF(SUM) 307,310,310 

307 IF(SUM-.0005+DELTA) 312,312,310 

310 ERROR=SGN(J)*DELTA»SUM 
ENORM=PIE* ( ERROR/ 1 7 o 0) 

DO 311 1=1,17 

311 WEY(I)=WEY(I)+EN0RM*PAT(I,J) 

312 CONTINUE 
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C MODIFIED RELAXATlor,' ADAPTION 
IF(SGN(J)) 356,354,354 

354 IF(SUll) 360,360,355 

355 IF(SUJI-DELTA) 360,362,362 

356 IF(SUl-l) 357,360,360 

357 IF(SUM+DSLTA) 362,362,360 

360 ERROR=3GN(J)*AD^iPT-SUM 
EN0RM=P I E*( ERROR / 1 7 . 0 ) 

DO 361 1=1,17 

361 WEY ( r ) =WE Y ( I) +EKORM*PAT ( I , J) 

362 CONTINUE 
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APPENDIX II 



e)C'lmples of ad/\ption procedures 

An example of pattern separation was solved using each of the 
four different adaption procedures to illustrate their characteristics. 
The test patterns were: 



(+ 1 ) 




. X . X 
• . X . 
. X . X 

9 » • O 

( 2 ) 



X . X . 

a X . a 

X . X 9 

(3) 



0 0 0 0 



. X . X 



« 



, X . X 

(4) 



(- 1 ) 



e o X • 

X . X » 
XXX, 
« • 0 0 

(5) 



0 0 0 X 
0 X 0 X 
0 X X X 
0 0 0 0 
( 6 ) 



0 a X 0 
X 0 X 0 
X X X 0 



(7) 



0 0 O 0 

0 0 0 X 



X 0 X 
XXX 
( 8 ) 



When a pattern is imposed on an Adaline, corrections are made to the 
weights so that its analog output satisfies the training criteria. This, 
however, has a tendency to partially destroy some of the effects of 
previous training. In an attempt to illustrate this process, the analog 
output of the Adaline was examined after each adaption throughout this 
test. 



The fixed conditions of this experiment were: 

Weights: Continuously variable with initial values of 0«0 

Pattern sequence? 1, 2, 3, 4, 5, 6, 7, 8 

Proportional constants 1,00 

Minimum square error adaption levels 3 O 0 OO 

Minimum square error adaption bounds + 1.00 

Incremental constants 1,00 

Adaption dead zones 30.00 

Adaption levels 40.00 
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Minimum Square Error Adaption 



Number 


of iterations 


required to converges 


64 


Weight 


elements after 


convergence. 




3.436 


7.864 


10.197 -5.141 


-5.773 




-.725 


9.090 .228 


-11.528 




-17.821 


-12.380 -16.595 


-15.091 




-.400 


-3,237 -9.718 


4.442 



Analog output of 



each pattern after 
Pattern 
1 
2 

3 

4 

5 

6 

7 

8 



convergence • 

Analog output 
29.819 
29.385 
29.031 
29.368 
-29.846 
-29.529 
-30.063 
-30.000 
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Incremental Adaption 



Number 


of iterations required 


to converges 82 


Weight 


elements after convergence. 


4.000 


10.000 14.000 


-6.000 -6.000 




-2.000 10.000 


.000 -14.000 




-22.000 -14.000 


-22,000 -16,000 




-2.000 -10.000 


-12.000 6.000 


'3 output of each pattern after 


convergence. 




Pattern 


Analog output 




1 


30.000 




2 


46.000 




3 


30.000 




4 


30.000 




5 


-42.000 




6 


-34.000 




7 


-46.000 




8 


-30,000 
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Relaxation Adaption 



Number 


of iterations 


required to converges 


168 


Weight 


elements after 


convergence. 




3.540 


7.881 


10.304 -5.277 


-5,918 




-.730 


9.141 .285 


-11,706 




-18 . 106 


-12.534 -16.846 


-15,344 




-.401 


-8.312 -9.827 


4.516 



Analog output of each pattern 
Pattern 
1 
2 

3 

4 

5 

6 

7 

8 



after convergences 

Analog output 
30.000 
30.000 
30.000 
30.000 
-30.000 
-30.000 
-30.000 
-30.000 



Notes The criterion of the relaxation adaption procedure is that the 
magnitude of , the analog output for each pattern be equal to or greater 
than the adaption dead zone. In this particular example the magnitude 
of each pattern analog output after training was exactly equal to adap- 
tion dead zone. This would not normally be expected . 
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Modified Relaxation Adaption 



Number 


of iterations required 


to converges 31 


Weight 


elements after convergence. 


3.896 


9.918 12.819 


-5.197 -6,787 




-1.319 9.970 


,057 -13,871 




-20.536 -14.237 


-20.485 -16.814 




-1.542 -9.604 


-11.569 5.735 


)g output of each pattern after 


convergence. 




Pattern 


Analog output 




1 


34,697 




2 


37.435 




3 


30.138 




4 


30.844 




5 


-36.076 




6 


-37.091 




7 


-40.000 




8 


-33.365 
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APPENDIX III 



MINIMUM SQUARE ERROR ADAPTION AS A FUNCTION OF PROPORTIONAL CONSTANT 



An investigation of minimum square error adaption was conducted to 
determine the effect of the proportional constant. The number of iter<= 
at ions required for convergence and the resultant value of the weight 
elements were examined. Fixed conditions of the experiment weres 
Adaption levels 30.00 
Bound for minimum square errors +1.00 

Weights! Continuously variable with initial values of 0.0 

Pattern sequences! A. 1, 2, 3, 4, 5» 6, 7^ 8 

B. I 9 5^ 2 j) 6 ^ 3> 7j 4j 8 

C. Random 



Training patterns 

X & J X 

(+1) X 



. X 

X . 

. X 



e o o ft 

( 1 ) 

. . X . 
X . X . 



. X . X 
« e X . 
• X . X 
.... 



( 2 ) 



. o . X 
. X . X 



0 0 0 . 
X 0 X . 
0 X . . 
X 0 X . 
(3) 

e o 0 0 



ft 0 o ft 

ft X ft X 
ft ft X ft 
ft X ft X 
(4) 

0 0 0 0 
00.x 

X 



(-1) 


XXX. 


0 X X X 


X . X 0 


0 X 0 




ft ft 0 ft 


ft ft 0 ft 


X X X 0 


0 X X 




(5) 


(6) 


(7) 


(8) 


C & T 


XXX. 


0 X X X 


ft ft • 0 


XXX 




X ft ft ft 


e ft 0 X 


X 0 0 X 


X ft ft 


(+1) 


X ft ft ft 


ft ft ft X 


X 0 0 X 


X ft 0 




X X X ft 


ft X X X 


X X X X 


ft 0 0 




(1) 


(2) 


(3) 


(4) 



X 

X 



(= 1 ) 



X X 
X . 
X . 
X 0 
(5) 



ft ft ft X 

X X X X 

ft ft ft X 

0 ft ft ft 

( 6 ) 



0 0 ft ft 

X ft ft ft 
X X X X 
X 0 0 ft 
(7) 



ft X e 
ft X ft 
ft X ft 
XXX 
( 8 ) 
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Training patterns 



X & 0 

(+ 1 ) 



(- 1 ) 

U & T 

(+ 1 ) 



(- 1 ) 



X , X . 
o X o « 
X , X . 

O 0 « o 



X . . X 
X « « X 
X . o X 
X X X X 
( 1 ) 

X X X X 
.XX. 
.XX. 
.XX. 
(5) 



. X . X 

• 0 X 0 

• X o X 

• 000 



X X X X 
X O 0 0 
X O 0 0 
X X X X 
( 2 ) 

... X 
X X X X 
X X X X 
. . o X 

( 6 ) 



0 0 0 0 
X . X 0 
0 X . . 
X . X . 



X X X X 
X . . X 
X , . X 
X . . X 
(3) 

.XX. 
.XX. 
.XX, 
X X X X 
(7) 



O O O 0 

, X . X 
. 0 X 0 
. X . X 



(1) 


(2) 


(3) 


(4) 


XXX, 


0 X X X 


0 0 0 0 


0 0 0 0 


X 0 X 0 


. X 0 X 


XXX. 


0 X X X 


XXX. 


.XXX 


X 0 X 0 


0 X 0 X 


0 0 0 0 


O 0 O 0 


XXX, 


.XXX 


(5) 


(6) 


(7) 


(8) 



XXX 
0 0 0 
0 0 6 
XXX 
(4) 

X 0 . « 
X X X X 
X X X X 

X o o o 

( 8 ) 
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X X X X 



I 



« 



I 



I 



i- 










• * I 






M 



M 





0 ' 



NUMBER OF ITERATIONS REQUIRED TO CONVERGE FOR MINIMUM 

ERROR ADAPTION 



PROPORTIONAL CONSTANT 



PATTERN 


SEQUENCE 


.25 


o 

o 


.75 


1.00 


1.25 


1.50 


1.75 


1.90 


2.00 


X&J 


A 


434 


188 


104 


64 


42 


72 


162 


455 


■rff 




B 


423 


182 


109 


66 


49 


71 


162 


434 


* 




C 


440 


200 


136 


96 


72 


99 


160 


392 




C&T 


A 


143 


68 


37 


47 


35 


52 


132 


353 






B 


149 


68 


46 


49 


41 


49 


123 


343 


☆ 




C 


139 


72 


56 


40 


72 


80 


168 


383 




X&O 


A 


297 


129 


73 


45 


45 


69 


122 


360 






B 


300 


144 


84 


54 


47 


66 


145 


367 


* 




C 


312 


152 


96 


72 


72 


85 


143 


366 


* 


U&T 


A 


96 


55 


35 


27 


37 


58 


92 


259 


ic 




B 


97 


51 


35 


31 


45 


60 


149 


397 


ic 




C 


104 


56 


40 


45 


40 


72 


120 


280 


* 



* denotes nonconvergence 
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RESULTANT WEIGHT ELEMENTS AFTER CONVERGENCE FOR X & J PATTERNS 



PROPORTIONAL 

CONSTANT 


PATTERN 

SEQUENCE 




WEIGHT VALUES AFTER 
CONVERGENCE 




.25 


A 


3.454 


7.661 


10.010 


-5.114 


-5.758 








= . 737 


8,868 


,265 


-11,380 








ol7«576 


-12.174 


-16.368 


-14.892 








-.425 


—8.074 


-9.559 


4.389 


1.00 


A 


3.436 


7,864 


10,197 


-5.141 


-5.773 








-.725 


9.090 


,228 


-11.528 








-17.821 


-12,380 


-16,595 


-15,091 








-.400 


-8.237 


-9.718 


4.442 


1.90 


A 


3.519 


7,936 


10.373 


-5.306 


-5.979 








-.859 


9.161 


.202 


-11.658 








-18.137 


-12.682 


-16,912 


-15.342 








-.448 


-8.136 


-0.811 


4.694 


.25 


B 


3.446 


7.622 


9.991 


-5,137 


-5.742 








-.713 


8.869 


.285 


-11.352 








-17.575 


-12.162 


-16.328 


-14.895 








-.394 


-8,059 


-9.548 


4.380 


1.00 


B 


3.406 


7.670 


10,244 


-5.334 


“5.698 








— .663 


9.199 


00 

00 


=11,464 








-17.932 


-12.542 


-16.403 


-15,285 








-.259 


-8.121 


-9.846 


4.478 
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PROPORTIONAL 

CONSTANT 


PATTERN 

SEQUENCE 




WEIGHT VALUES AFTER 
CONVERGENCE 




1.90 


B 


3.569 


7.950 


10.375 


=5.356 


-6.014 








« ®920 


9.144 


.220 


-11.739 








«=»18® l60 


-12.729 


=16.966 


-15,378 








o«4l8 


-8.154 


-9,782 


4,650 


.25 


C 


3.426 


7*672 


10.099 


-5.116 


-5.740 








0 

o 

o 


8.878 


,259 


-11.357 








-17.573 


-12.172 


-16.367 


-14.878 








-.389 


-3.063 


-9.527 


4.394 


1.00 


C 


3.467 


7.803 


10,130 


=5.193 


-5,853 








— .674 


9.104 


,290 


-11,560 








-17.826 


-12.455 


-16.525 


-15.247 








-.310 


-8.210 


-9.705 


4.423 


1.90 


C 


3.657 


7.762 


10.126 


-5.357 


-5.943 








-.775 


9.026 


.270 


=11,689 








-18.095 


-12.358 


-16.831 


-15,239 








-.394 


-3.358 


-9.690 


4,380 
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COMMENTS 



1* The proportional constant strongly affects the number of iterations 
required to converges A minimum number of iterations vas required when 
the proportional constant was approximately equal to ones The adaption 
process will not converge if the proportional constant is equal to two» 

2* For each pattern pair, the final values of the weight elements were 
approximately the same regardless of the proportional constant chosen 
or the sequence of the training patterns* Only a representative sampling 
of resultant weights are Included in the data, but all the results sup° 
ported the above conclusion* 
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APPENDIX IV 



ADALINE RESPONSE TO SmiLAR PATTERNS 



An Adaline was trainedj, until it had converged to a solution^ on the 
ten patterns shown below* After convergences, 50 "similar" patterns were 
imposed upon the trained Adaline* the "similar" patterns being generated 
by randomly changing one pattern element of the training patterns* This 
test was conducted using each of the four adaption procedures in turn* 
Training Patterns 



+1 Class 



X . * X * 

.XX*. 
.XX.. 
X , . X . 



( 1 ) 




0 o o o o 

X . . X . 
.XX.. 
.XX.. 
X , . X . 
(3) 



’1 Class 




.XX. 
.XX. 
X 0 0 X 
(4) 




• • 0 X • 


9 O 9 9 X 


0 9 0 0 0 


9 0 0 0 


0 


e • 0 X • 


9 9 9 9 X 


0 . 0 X 0 


9 0 0 0 


X 


X • 0 X 0 


9 X 9 9 X 


0 . 0 X 0 


0 9 0 0 


X 


X X X X • 


9 X X X X 


X . . X . 


. X . 0 


X 


• • 9 • 9 


9 0 0 9 0 


X X X X . 


.XXX 


X 


(6) 


(7) 


(8) 


(9) 




The fixed 


conditions of the 


experiment were* 







0 0 0 0 



X X X X 

( 10 ) 



X 

X 

X 

X 

X 



Weights: Continuously variable with initial values of 0.0 



Pattern sequence: 1* 2, 3, 4, 5, 6, 7, 8, 9, 10 



Proportional constant: 1.00 



Minimum square error bound: t 1*00 

Adaption dead zone: 30.00 



Incremental constant: 1.00 



Adaption level: 40*00 
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Adaption 

Procedure 


Number of 
Iterations 


Range of Analog 
Output for Train- 
ing Patterns 


Range of Analog 
Output for Similar 
Patterns 






+ 




+ 




Minimum 

Square 

Error 


103 


30»128 

29.587 


29.076 

30.837 


35.657 

6.508 


19.268 

38.684 


Incre° 

mental 


29 


62.000 

36,000 


36.000 

52.000 


58.000 

18.000 


18.000 

64.000 


Relaxation 


99 


42.274 

30.000 


30,000 

38.701 


43.564 

17.171 


17.099 

49.652 


Modified 

Relaxation 


19 


52.642 

30.534 


35.249 

46.580 


55.708 

17.156 


17.367 

57.823 
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If 



COMMENTS 



The range of the analog outputs for tne similar patterns is de° 
pendent upon the final trained values of the weight elements# In this 
example, one pattern element was changed in the generation of each simi° 
lar pattern and this had the effect of changing the sign of one corres° 
ponding weight element# Thus the largest change, in this case, would 
occur when the pattern bit corresponding to the largest weight element 
is changed in sign# 

All of the analog outputs of the similar patterns were of the same 
sign as the desired binary output# Hov;ever, if similar patterns had 
been formed by changing more than one pattern element, it is possible 
that some of the similar analog outputs i7ould be of a different sign 
than the desired pattern binary output# 
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APPENDIX V 



THE RESPONSE OF AN ADALINE TO TRAINING PATTERNS THAT ARE NOT LINEARLY 

SEPARABLE 

Tvjo pattern classes were chosen so that they were not linearly s©pa<= 
rableo This was accomplished by assigning the same pattern to both the 
(+1) and the (=>1) class. There were a total of 100 inpat patterns^ each 
of which was composed of nine pattern elements. The training patterns 
used in this experiment are tabulated in j^l]]* 

The experiment consisted of two parts ^ the first consisting of an 
attempt to separate the two pattern classes by imposing all the training 
patterns on the Adaline. This was performed for both 500 and 5000 iter° 
ations. Seconds a random sample of ten patterns was chosen and Adaline 
was trained on these patterns for 200 iterations or until it had cowerged 
to a solution. Then the remaining 90 patterns were imposed upon the 
trained Adaline. The fixed conditions for this experiment are the same 
as those listed in Appendix IV. 

The results are listed in the following table® For this experiments, 
a pattern was defined to be not correctly identified if its analog output 
was either of an opposite sign to the desired pattern binary outputs, or 
if the analog output was zero. 
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Number of Patterns Not Correctly Identified 





Adaption Procedures 


Number of 

Training 

Patterns 


Number of 
Iterations 


Mlno Sqo 
Error 


Inco 


Relaxation 


Modified 

Relaxation 


100 


500 


13 


4 


7 


7 


100 


5000 


13 


2 


5 


5 


10* 


200 


11 


9 


9 


9 


10* 


200 


14 


10 


10 


10 


10* 


200 


14 


9 


10 


9 


10* 


200 

1 


22 

i 


11 


11 


7 

1 .. . 



* Different random samples of ten patterns » 
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APPENDIX VI 



NUMERIC RECOGNITION TESTS 



A computer simulated Madaline consisting of four Adalines was em° 
ployed to attempt the separation of the ten numeric characters into the 
ten different classes, zero through nine« Ten hand written digits (one 
set) were obtained from each of five persons » and were converted into 
patterns in a seven by seven input spaceo The computer program was 
written in Fortran and executed on a CDC 1604 digital computer using the 
Minimum Square Error adaption scheme with a proportionality constant of 
one and an adaption dead zone of 30» The following output coding scheme 
was used in this investigations 

Digit Adaline 1 Adaline 2 Adaline 3 Adaline 4 



0 1 1 



1 



I 



1 °l 



“1 



°1 



1 



2 1 =1 



= 1 



3 

4 

5 

6 

7 

8 
9 



A series of tests were run in which one$ two or more sets of 
patterns were used to train the Madaline;, after which the Madaline was 
asked to recognize and classify all fifty input patternso The number 
of training iterations performed prior to the classification checks and 
the number of digits recognized out of the total of fifty imposed on 
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I 




C2> 



IjM « 







the system are included in the data tables below. 



Noo of patterns 
trained on 

. 


NOo of training 
iterations 


No® of patterns 
checked 


Uoo of patterns 
classified 


1 set of 10 


4 iterations 
per pattern 


50 


36 


2 sets of 10 


4 iterations 
per pattern 


50 


36 


3 sets of 10 


4 iterations 
per pattern 


50 


44 


4 sets of 10 


4 Iterations 
per pattern 


50 


45 


3 sets of 10 


4 iterations 
per pattern 


50 

1 


50 



No® of patterns 
trained on 


No® of training 
iterations 


No® of patterns 
checked 


1 1 

No® of patterns 
classified 


1 set of 10 


200 total 
iterations 


50 


3? 


2 sets of 10 


200 total 
iterations 


50 


38 


3 sets of 10 


200 total 
iterations 


50 


44 


4 sets of 10 


200 total 
iterations 


50 


44 


5 sets of 10 


200 total 
iterations 


50 


50 



The results of this test show that the system must be trained on 



all patterns to insure that it will be capable of classifying all the 
patternSs Howevers, training on just one set gave the system the capa® 
bility of recognizing many of the other patterns submitted t© it for 
classifications Of interest is the suggestion that repeated iterations 
do not necessarily improve the ability of the system to classify the im=> 
posed inputs o 
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APPENDIX VII 



FORTRAN SYMBOL CLASSIFICATION 

The sole purpose of this test was to determine whether or not 45 
hand printed Fortran symbols could be separated by a computer simulated 
Madeline consisting of the minimum number of Adelines that could theo» 
retically accomplish this task* Six Adalines, each with a seven by 
seven input space^ xvere utilized to accomplish the desired separation 
after the hand written characters had been converted into input patterns* 
The computer program was written in Fortran and executed on a CDC 1604 
digital computer using the Minimum Square Error adaption technique* A 
proportionality constant of one and an adaption dead zone of 30 were 
used for this test* 

After extensive manipulation of the Madeline output coding schemes;, 
the 45 characters were properly classified* Nine thousand training 
iterations were executed prior to the classification of the 45 input 
patterns* The 45 Fortran symbols and the output coding scheme which 
accomplished the desired classification are as follows s 



Pattern 


Adeline 

One 


Adeline 

Two 


Adeline 

Three 


Adeline 

Four 


Adeline 

Five 


Ada line 
Six 


0 


-1 


1 


1 


1 


<=1 


1 


1 


1 


1 


1 


1 


“1 


1 


2 


= 1 


1 


■=■1 


1 


= 1 


1 


3 


= 1 


1 


■=1 


-1 


1 


1 


4 


1 


1 


1 


1 


1 


“1 


5 


1 


1 


1 


= 1 


“ I 


“1 


6 


“1 


1 


= 1 


1 


1 


1 


7 


1 


1 


1 


o 1 


= 1 


1 
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I 



I 



Pattern 

8 

9 

A 

B 

C 

0 

E 

F 

G 

H 

I 

J 

K 

L 

M 

N 

0 

P 

Q 

R 

S 

T 

U 

V 

w 

X 



Adeline 

One 


Adeline 

Two 


Adeline 

Three 


1 


1 


= 1 


1 


1 


“1 


“1 , 


1 


1 


1 


” J 


1 


• 1 


1 


1 


1 


= 1 


1 


1 


= 1 


1 


1 


“1 


1 


= 1 


1 


°1 


1 


“1 


1 




1 


1 


1 


1 


®1 


1 


“1 


1 


1 


“1 


1 


1 


1 


1 


1 


• 1 


-1 


1 


1 


1 


1 


-1 


1 




1 


“1 


1 


1 


1 


<■1 


1 


1 


1 


1 


1 


1 


• 1 


= 1 


= 1 


1 


“1 


® 1 


1 


1 


® 1 


1 


1 



Adeline 

Four 


Adeline 

Five 


Adeline 

Six 


1 


1 


“ 1 




1 


1 


= 1 


1 


1 


1 


1 


1 


°1 


-1 


1 


1 


“1 


1 


1 


1 


= 1 


= 1 


1 


= 1 


1 


“1 


“ 1 


“1 


° 1 




1 


1 


1 


»1 


1 


1 


1 


°1 


=>1 


=>1 




1 


1 


“1 


°1 


1 


=1 


1 


1 


1 


1 


= 1 


I 


1 


1 


1 


= 1 


= l 


1 


= 1 


1 


1 


°1 


=>1 


1 


1 


1 


1 


<=1 


“1 


»1 


= 1 


= 1 


1 


=• 1 


= 1 


“1 


“1 
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Pattern Ada line Ada line 

One Two 


Ada line 
Three 


Adeline 

J^ur 


Adeline 

Five 


Ada line 
Six 




Y =1 


1 


1 


1 


=>1 


“1 




Z -1 


® 1 


1 


= 1 


=■1 






+ 1 


1 




1 


1 


1 




ca ^ X 




1 


“1 


1 


1 




/ ”1 


° 1 


1 


1 


<=1 


GO ^ 




( 


-1 


1 


“1 


1 






) “1 


O) ^ 


1 


1 


=>1 


1 








“1 


»1 


1 


= 1 




0 “1 


-1 


= 1 


“1 


°1 


1 




“ «x 


■=1 


“1 


i 


“1 


1 




Vc a , J 


=1 


° 1 


1 


1 


1 




The final output coding scheme was obtained mainly by trial i 


and 


error methods » However, an effort was made 


to use 


certain characteristics 


of 


the individual 


input patterns 


to facilitate the 


desired classification. 


Specifically, at least one Adeline was trained to produce the same output 


for each of the following sets of 


input patterns § 






a) 


Patterns with 


a long vertical 


line in the left 








hand column<> 








Example; 


E,L,P 


b) 


Patterns with 


other long vertical lines 


• 


Examples 


4,I,T 


c) 


Patterns with 


a long horizontal lineo 




Examples 


E,H,I 


d) 


Patterns with 


large circles 






Examples 


0,Q 


e) 


Patterns with 


small circles 






Examples 


8d9,P 


f) 


Patterns with 


small horizontal lines 




Examples 


A,+, ° 


8) 


Patterns with 


small vertical 


lines 




Examples 


5,+ 


h) 


Patterns with 


left to right slant lines 




Examples 


N,V,X 
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i) Patterns with right to left slant lines 

j) Patterns which were smaller than the others 



Examples l<i,Z,/ 
Examples 
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